nigerian prince
Mitigating Fine-tuning based Jailbreak Attack with Backdoor Enhanced Safety Alignment
Wang, Jiongxiao, Li, Jiazhao, Li, Yiquan, Qi, Xiangyu, Hu, Junjie, Li, Yixuan, McDaniel, Patrick, Chen, Muhao, Li, Bo, Xiao, Chaowei
Despite the general capabilities of Large Language Models (LLM), these models still request fine-tuning or adaptation with customized data when meeting specific business demands. However, this process inevitably introduces new threats, particularly against the Fine-tuning based Jailbreak Attack (FJAttack) under the setting of Language-Model-as-a-Service (LMaaS), where the model's safety has been significantly compromised by fine-tuning users' uploaded examples contain just a few harmful examples. Though potential defenses have been proposed that the service providers can integrate safety examples into the fine-tuning dataset to reduce safety issues, such approaches require incorporating a substantial amount of data, making it inefficient. To effectively defend against the FJAttack with limited safety examples under LMaaS, we propose the Backdoor Enhanced Safety Alignment method inspired by an analogy with the concept of backdoor attacks. In particular, service providers will construct prefixed safety examples with a secret prompt, acting as a "backdoor trigger". By integrating prefixed safety examples into the fine-tuning dataset, the subsequent fine-tuning process effectively acts as the "backdoor attack", establishing a strong correlation between the secret prompt and safety generations. Consequently, safe responses are ensured once service providers prepend this secret prompt ahead of any user input during inference. Our comprehensive experiments demonstrate that through the Backdoor Enhanced Safety Alignment with adding as few as 11 prefixed safety examples, the maliciously fine-tuned LLMs will achieve similar safety performance as the original aligned models without harming the benign performance. Furthermore, we also present the effectiveness of our method in a more practical setting where the fine-tuning data consists of both FJAttack examples and the fine-tuning task data.
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- (2 more...)
- Information Technology > Security & Privacy (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
The Long Shadow of the 'Nigerian Prince' Scam
In November 2021, Oluwaseun Medayedupin was arrested by the Nigerian police in Lagos. An investigation found that he had been pursuing "disgruntled employees" from American companies and pushing them to release ransomware on internal enterprise servers, offering a percentage of the cut if they agreed to collaborate in the attack. This was a sophisticated social engineering scheme, far more advanced than the notorious "Nigerian prince" emails that have made the country of Nigeria synonymous with scams. The origins of these types of scams may be attributed to a boom in the establishment of cybercafes during the 1990s, coinciding with falling oil prices in Nigeria and a rise in unemployment. Add in a lack of national social security, and many Nigerians were forced to seek out alternative forms of employment--physical labor; gig work; and, most notoriously, cybercrime.
- North America > United States (0.31)
- Africa > Nigeria > Oyo State > Ibadan (0.05)
Five Ways You're Already Using Machine Learning: A Day with AI - insideBIGDATA
In this special guest feature, Mark Scott, CMO at Apixio, highlights the prevalence of machine learning in everyday life and offers five ways you're (probably) already using machine learning all without you realizing or thinking about it. Mark has more than 19 years of medical technology and health care provider marketing experience. His expertise covers all the bases--from brand development, positioning and messaging; to brand identity, packaging and labeling; public relations; content marketing, website development; internal/employee communications; and global brand-launch activations. Mark has a Bachelors and a Masters Degree from the University of Western Ontario. "Machine learning" can seem like a scary term, bringing to mind images of the techno-dystopias portrayed in the Matrix, Terminator, and Black Mirror.
- Banking & Finance (0.71)
- Education > Educational Setting > Higher Education (0.55)
- Information Technology > Services (0.51)
- Health & Medicine > Health Care Providers & Services (0.35)
How Do Machine Learning Programs "Learn"?
In this article, we look at two machine learning (ML) techniques, Naive Bayes classifier and neural networks, and demystify how they work. With all the hype surrounding self-driving cars and video-game-playing AI robots, it's worth taking a step back and reminding ourselves how machine learning programs actually "learn". In this article, we look at two machine learning (ML) techniques–spam filters and neural networks–and demystify how they work. And if you're not sure what machine learning even is, read about the difference between artificial intelligence, machine learning, and deep learning. One common machine learning algorithm is the Naive Bayes classifier, which is used for filtering spam emails.
- Education (0.62)
- Information Technology (0.44)
UPDATED: Machine learning can fix Twitter, Facebook, and maybe even America
Chris Nicholson co-founded Skymind and Deeplearning4j, the most popular deep-learning framework for Java. Quitting Twitter is easy -- I've done it a hundred times. Someone called it "a clown car that drove into a gold mine," and like all clown cars, Twitter makes the passengers get out once in awhile. If I go back, it's because I'm addicted. For an information junkie, that little bubble is hard to resist.
- North America > United States > Wisconsin (0.05)
- North America > United States > Pennsylvania (0.05)
- North America > United States > Michigan (0.05)
- (7 more...)
Machine learning can fix Twitter, Facebook, and maybe even America
I've done it a hundred times. Someone called it "a clown car that drove into a gold mine," and like all clown cars, Twitter makes the passengers get out once in awhile. If I go back, it's because I'm addicted. For an information junkie, that little bubble is hard to resist. But Twitter -- and Facebook, for that matter -- is desperately broken in ways that alienate users, spread hate and endanger us as a species.
- North America > United States > Wisconsin (0.05)
- North America > United States > Pennsylvania (0.05)
- North America > United States > Michigan (0.05)
- (7 more...)
- Information Technology > Services (0.96)
- Government > Regional Government > North America Government > United States Government (0.95)
- Media (0.71)